{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "7bdf3dd5-6a3f-40f2-9601-aa061b34ef17",
   "metadata": {},
   "source": [
    "#  KNN （k,nearest, Neighbors）分类算法，回归算法\n",
    "即“K个最近的邻居”"
   ]
  },
  {
   "cell_type": "raw",
   "id": "bcaafd6d-003b-4b62-979c-dc8fe56de8ef",
   "metadata": {},
   "source": [
    "KNN:\n",
    "    选定参数K，选取K个距离最近的“邻居”，进行投\n",
    "    票\n",
    "    使用百分比归一化，使各个变量的影响相同"
   ]
  },
  {
   "cell_type": "raw",
   "id": "effd4d2f-01c7-4945-bd95-5f2a96303145",
   "metadata": {},
   "source": [
    "# 必考\n",
    "准确率(正确率)Accuracy:预测正确的样本/总样本\n",
    "精确率(查准率)Precision：将真预测为真的样本/预测为真的样本\n",
    "召回率(查全率)Recall：将真预测为真的样本/真样本"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "9891eb46-caf3-481b-b155-7dcf3d9be0a9",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn. neighbors import KNeighborsClassifier \n",
    "\n",
    "knn = KNeighborsClassifier(n_neighbors=5, weights='uniform', algorithm='auto', leaf_size=30,\n",
    "                           p=2, metric='minkowski', metric_params=None, n_jobs=1)\n"
   ]
  },
  {
   "cell_type": "raw",
   "id": "9f010375-f4b5-499c-9e07-e8544d9d73ae",
   "metadata": {},
   "source": [
    "一些重要的参数：\n",
    "n_neighbors  ---k近邻的k值，默认值是5\n",
    "weights  ---预测时邻居的权重， uniform表示邻居权重一样大，distance表示距离小的权重大\n",
    "metric ---距离计算公式， ‘minkowski’闵氏距离\n",
    "p ---闵氏距离的参数，p=2就是欧式距离，p=1就是曼哈顿距离\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2b14969-e595-4636-a97d-92d31df2017f",
   "metadata": {},
   "source": [
    "### 准备数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "8fdd5043-e1c3-40f4-a615-eda240617383",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn import datasets\n",
    "# 导入sklearn自带的iris数据集\n",
    "iris = datasets.load_iris()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "2ac008cd-2d85-4853-82d1-f8cc5c346b69",
   "metadata": {},
   "outputs": [],
   "source": [
    "#iris"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "014150cf-1f91-4c7b-b4a4-eba2d9bd8d07",
   "metadata": {},
   "outputs": [],
   "source": [
    "#iris.data   # 同iris[data]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "b365ee38-1dbe-461c-aab1-3100d1c0ee8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "# model_selection 模型选择       train_test_split 训练-测试-数据分割\n",
    "\n",
    "X_train,X_test,y_train,y_test=train_test_split(iris.data,iris.target,test_size=0.3,random_state=1024)  \n",
    "# 从data,target分出x.y的测试和训练量，其中测试量为30%，形成X_train,X_test,y_train,y_test四个合集\n",
    "# test_size=0.3  测试数据比例0.3        random_state=1024  均匀和随机选择，可复现（相同1024和相同data可以得到相同随机量）"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e7d712c3-e719-4a8b-8f3a-29d20b3f8052",
   "metadata": {},
   "source": [
    "### 模型训练"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "f8810e59-2be1-4e2a-bf38-0f189d6f6086",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>#sk-container-id-2 {color: black;background-color: white;}#sk-container-id-2 pre{padding: 0;}#sk-container-id-2 div.sk-toggleable {background-color: white;}#sk-container-id-2 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-2 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-2 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-2 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-2 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-2 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-2 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-2 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-2 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-2 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-2 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-2 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-2 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-2 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-2 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-2 div.sk-item {position: relative;z-index: 1;}#sk-container-id-2 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-2 div.sk-item::before, #sk-container-id-2 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-2 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-2 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-2 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-2 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-2 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-2 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-2 div.sk-label-container {text-align: center;}#sk-container-id-2 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-2 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-2\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-2\" type=\"checkbox\" checked><label for=\"sk-estimator-id-2\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">KNeighborsClassifier</label><div class=\"sk-toggleable__content\"><pre>KNeighborsClassifier()</pre></div></div></div></div></div>"
      ],
      "text/plain": [
       "KNeighborsClassifier()"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "# 实例化模型并训练\n",
    "model = KNeighborsClassifier()\n",
    "\n",
    "# 使用训练集训练模型\n",
    "model.fit(X_train,y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "171c35f3-3661-4065-95e0-0d1a4a29c01a",
   "metadata": {},
   "source": [
    "### Step4：模型评估\n",
    "针对分类模型，有很多评估方法，常用的有\n",
    "- 误分类矩阵 confusion_matrix\n",
    "- 分类报告 classification_report\n",
    "- ROC曲线和AUC值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c50c0611-6eb0-47a8-a648-b56668400c33",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "f6c2d8ce-a066-4f49-b6be-2143ee32f5c2",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.9555555555555556\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           0       1.00      1.00      1.00        14\n",
      "           1       1.00      0.88      0.93        16\n",
      "           2       0.88      1.00      0.94        15\n",
      "\n",
      "    accuracy                           0.96        45\n",
      "   macro avg       0.96      0.96      0.96        45\n",
      "weighted avg       0.96      0.96      0.96        45\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 方法1：直接使用模型的score方法计算正确率\n",
    "print(model.score(X_test,y_test))\n",
    "\n",
    "# 方法2：使用sklearn.metrics下的classification_report方法\n",
    "# 先对测试集进行预测\n",
    "y_pred = model.predict(X_test) #预测类别标签\n",
    "y_pred_prob = model.predict_proba(X_test) #预测类别概率\n",
    "\n",
    "# 分类评估报告classification_report\n",
    "from sklearn.metrics import classification_report\n",
    "print(classification_report(y_test,y_pred))\n",
    "\n",
    "# precision  正确率        recall 召回率     support  此类别的样本数量\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5c6a39e7-02fe-432d-b462-244b95866834",
   "metadata": {},
   "source": [
    "# 作业"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "8d4b2548-851b-427a-b8c3-77af6d5dd55a",
   "metadata": {},
   "outputs": [
    {
     "ename": "SyntaxError",
     "evalue": "invalid character '，' (U+FF0C) (857992149.py, line 1)",
     "output_type": "error",
     "traceback": [
      "\u001b[1;36m  Cell \u001b[1;32mIn[25], line 1\u001b[1;36m\u001b[0m\n\u001b[1;33m    pandas读取文本文件，比例37，30测试。把剩下文本变为一个datasets，用KNN处理\u001b[0m\n\u001b[1;37m                ^\u001b[0m\n\u001b[1;31mSyntaxError\u001b[0m\u001b[1;31m:\u001b[0m invalid character '，' (U+FF0C)\n"
     ]
    }
   ],
   "source": [
    "pandas读取文本文件，比例37，30测试。把剩下文本变为一个datasets，用KNN处理"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "3695ae1b-5ec7-471c-8513-8fd799cb1c1f",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "df=pd.read_csv('wine.txt')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "467bd106-f742-4e15-80df-24730cf632ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 提取第一列\n",
    "df_target = df.iloc[:, 0]  # iloc[:, 0] 表示选择所有行的第一列\n",
    "df_target\n",
    "\n",
    "# 删去df中的第一列\n",
    "df.drop(['1'],axis=1,inplace=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "7e4d6847-bd14-488c-bcc2-34de78a989f5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>14.23</th>\n",
       "      <th>1.71</th>\n",
       "      <th>2.43</th>\n",
       "      <th>15.6</th>\n",
       "      <th>127</th>\n",
       "      <th>2.8</th>\n",
       "      <th>3.06</th>\n",
       "      <th>.28</th>\n",
       "      <th>2.29</th>\n",
       "      <th>5.64</th>\n",
       "      <th>1.04</th>\n",
       "      <th>3.92</th>\n",
       "      <th>1065</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>13.20</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2.14</td>\n",
       "      <td>11.2</td>\n",
       "      <td>100</td>\n",
       "      <td>2.65</td>\n",
       "      <td>2.76</td>\n",
       "      <td>0.26</td>\n",
       "      <td>1.28</td>\n",
       "      <td>4.38</td>\n",
       "      <td>1.05</td>\n",
       "      <td>3.40</td>\n",
       "      <td>1050</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>13.16</td>\n",
       "      <td>2.36</td>\n",
       "      <td>2.67</td>\n",
       "      <td>18.6</td>\n",
       "      <td>101</td>\n",
       "      <td>2.80</td>\n",
       "      <td>3.24</td>\n",
       "      <td>0.30</td>\n",
       "      <td>2.81</td>\n",
       "      <td>5.68</td>\n",
       "      <td>1.03</td>\n",
       "      <td>3.17</td>\n",
       "      <td>1185</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>14.37</td>\n",
       "      <td>1.95</td>\n",
       "      <td>2.50</td>\n",
       "      <td>16.8</td>\n",
       "      <td>113</td>\n",
       "      <td>3.85</td>\n",
       "      <td>3.49</td>\n",
       "      <td>0.24</td>\n",
       "      <td>2.18</td>\n",
       "      <td>7.80</td>\n",
       "      <td>0.86</td>\n",
       "      <td>3.45</td>\n",
       "      <td>1480</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13.24</td>\n",
       "      <td>2.59</td>\n",
       "      <td>2.87</td>\n",
       "      <td>21.0</td>\n",
       "      <td>118</td>\n",
       "      <td>2.80</td>\n",
       "      <td>2.69</td>\n",
       "      <td>0.39</td>\n",
       "      <td>1.82</td>\n",
       "      <td>4.32</td>\n",
       "      <td>1.04</td>\n",
       "      <td>2.93</td>\n",
       "      <td>735</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>14.20</td>\n",
       "      <td>1.76</td>\n",
       "      <td>2.45</td>\n",
       "      <td>15.2</td>\n",
       "      <td>112</td>\n",
       "      <td>3.27</td>\n",
       "      <td>3.39</td>\n",
       "      <td>0.34</td>\n",
       "      <td>1.97</td>\n",
       "      <td>6.75</td>\n",
       "      <td>1.05</td>\n",
       "      <td>2.85</td>\n",
       "      <td>1450</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>13.71</td>\n",
       "      <td>5.65</td>\n",
       "      <td>2.45</td>\n",
       "      <td>20.5</td>\n",
       "      <td>95</td>\n",
       "      <td>1.68</td>\n",
       "      <td>0.61</td>\n",
       "      <td>0.52</td>\n",
       "      <td>1.06</td>\n",
       "      <td>7.70</td>\n",
       "      <td>0.64</td>\n",
       "      <td>1.74</td>\n",
       "      <td>740</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>173</th>\n",
       "      <td>13.40</td>\n",
       "      <td>3.91</td>\n",
       "      <td>2.48</td>\n",
       "      <td>23.0</td>\n",
       "      <td>102</td>\n",
       "      <td>1.80</td>\n",
       "      <td>0.75</td>\n",
       "      <td>0.43</td>\n",
       "      <td>1.41</td>\n",
       "      <td>7.30</td>\n",
       "      <td>0.70</td>\n",
       "      <td>1.56</td>\n",
       "      <td>750</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>174</th>\n",
       "      <td>13.27</td>\n",
       "      <td>4.28</td>\n",
       "      <td>2.26</td>\n",
       "      <td>20.0</td>\n",
       "      <td>120</td>\n",
       "      <td>1.59</td>\n",
       "      <td>0.69</td>\n",
       "      <td>0.43</td>\n",
       "      <td>1.35</td>\n",
       "      <td>10.20</td>\n",
       "      <td>0.59</td>\n",
       "      <td>1.56</td>\n",
       "      <td>835</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>175</th>\n",
       "      <td>13.17</td>\n",
       "      <td>2.59</td>\n",
       "      <td>2.37</td>\n",
       "      <td>20.0</td>\n",
       "      <td>120</td>\n",
       "      <td>1.65</td>\n",
       "      <td>0.68</td>\n",
       "      <td>0.53</td>\n",
       "      <td>1.46</td>\n",
       "      <td>9.30</td>\n",
       "      <td>0.60</td>\n",
       "      <td>1.62</td>\n",
       "      <td>840</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>176</th>\n",
       "      <td>14.13</td>\n",
       "      <td>4.10</td>\n",
       "      <td>2.74</td>\n",
       "      <td>24.5</td>\n",
       "      <td>96</td>\n",
       "      <td>2.05</td>\n",
       "      <td>0.76</td>\n",
       "      <td>0.56</td>\n",
       "      <td>1.35</td>\n",
       "      <td>9.20</td>\n",
       "      <td>0.61</td>\n",
       "      <td>1.60</td>\n",
       "      <td>560</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>177 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     14.23  1.71  2.43  15.6  127   2.8  3.06   .28  2.29   5.64  1.04  3.92  \\\n",
       "0    13.20  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28   4.38  1.05  3.40   \n",
       "1    13.16  2.36  2.67  18.6  101  2.80  3.24  0.30  2.81   5.68  1.03  3.17   \n",
       "2    14.37  1.95  2.50  16.8  113  3.85  3.49  0.24  2.18   7.80  0.86  3.45   \n",
       "3    13.24  2.59  2.87  21.0  118  2.80  2.69  0.39  1.82   4.32  1.04  2.93   \n",
       "4    14.20  1.76  2.45  15.2  112  3.27  3.39  0.34  1.97   6.75  1.05  2.85   \n",
       "..     ...   ...   ...   ...  ...   ...   ...   ...   ...    ...   ...   ...   \n",
       "172  13.71  5.65  2.45  20.5   95  1.68  0.61  0.52  1.06   7.70  0.64  1.74   \n",
       "173  13.40  3.91  2.48  23.0  102  1.80  0.75  0.43  1.41   7.30  0.70  1.56   \n",
       "174  13.27  4.28  2.26  20.0  120  1.59  0.69  0.43  1.35  10.20  0.59  1.56   \n",
       "175  13.17  2.59  2.37  20.0  120  1.65  0.68  0.53  1.46   9.30  0.60  1.62   \n",
       "176  14.13  4.10  2.74  24.5   96  2.05  0.76  0.56  1.35   9.20  0.61  1.60   \n",
       "\n",
       "     1065  \n",
       "0    1050  \n",
       "1    1185  \n",
       "2    1480  \n",
       "3     735  \n",
       "4    1450  \n",
       "..    ...  \n",
       "172   740  \n",
       "173   750  \n",
       "174   835  \n",
       "175   840  \n",
       "176   560  \n",
       "\n",
       "[177 rows x 13 columns]"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "f5ca6585-5c4d-4010-b4db-f21ea528eb75",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "# model_selection 模型选择       train_test_split 训练-测试-数据分割\n",
    "\n",
    "\n",
    "X_train,X_test,y_train,y_test=train_test_split(df,df_target,test_size=0.3,random_state=1024)  \n",
    "\n",
    "# 从data,target分出x.y的测试和训练量，其中测试量为30%，形成X_train,X_test,y_train,y_test四个合集\n",
    "# test_size=0.3  测试数据比例0.3        random_state=1024  均匀和随机选择，可复现（相同1024和相同data可以得到相同随机量）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "7eef39ee-6be0-4596-999c-b19642da3cab",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(54, 13)"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train.shape\n",
    "X_test.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "id": "0059562a-cec1-4a92-be66-33530cc19b0c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>#sk-container-id-3 {color: black;background-color: white;}#sk-container-id-3 pre{padding: 0;}#sk-container-id-3 div.sk-toggleable {background-color: white;}#sk-container-id-3 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-3 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-3 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-3 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-3 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-3 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-3 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-3 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-3 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-3 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-3 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-3 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-3 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-3 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-3 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-3 div.sk-item {position: relative;z-index: 1;}#sk-container-id-3 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-3 div.sk-item::before, #sk-container-id-3 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-3 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-3 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-3 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-3 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-3 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-3 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-3 div.sk-label-container {text-align: center;}#sk-container-id-3 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-3 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-3\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier()</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-3\" type=\"checkbox\" checked><label for=\"sk-estimator-id-3\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">KNeighborsClassifier</label><div class=\"sk-toggleable__content\"><pre>KNeighborsClassifier()</pre></div></div></div></div></div>"
      ],
      "text/plain": [
       "KNeighborsClassifier()"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "# 实例化模型\n",
    "model = KNeighborsClassifier()\n",
    "\n",
    "# 使用训练集训练模型\n",
    "model.fit(X_train,y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "0d783381-2d8e-4ddb-996f-dd5d35824c0c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.8148148148148148\n",
      "              precision    recall  f1-score   support\n",
      "\n",
      "           1       0.95      1.00      0.97        18\n",
      "           2       0.86      0.75      0.80        24\n",
      "           3       0.57      0.67      0.62        12\n",
      "\n",
      "    accuracy                           0.81        54\n",
      "   macro avg       0.79      0.81      0.80        54\n",
      "weighted avg       0.82      0.81      0.82        54\n",
      "\n"
     ]
    }
   ],
   "source": [
    "# 方法1：直接使用模型的score方法计算正确率\n",
    "print(model.score(X_test,y_test))\n",
    "\n",
    "# 方法2：使用sklearn.metrics下的classification_report方法\n",
    "# 先对测试集进行预测\n",
    "y_pred = model.predict(X_test) #预测类别标签\n",
    "y_pred_prob = model.predict_proba(X_test) #预测类别概率\n",
    "\n",
    "# 分类评估报告classification_report\n",
    "from sklearn.metrics import classification_report\n",
    "print(classification_report(y_test,y_pred))\n",
    "\n",
    "# precision  正确率        recall 召回率     support  此类别的样本数量"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "a5fdf417-76f2-456a-91a6-63b8d0dd6646",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 回忆教科书第二章，KD数凭兴趣，\n",
    "# 做12习题如图，P37"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee784ffa-7bba-4e3f-a834-1a79a5e4b77c",
   "metadata": {},
   "source": [
    "1. k近邻法的三要素是什么?\n",
    "2. k近邻法的特点是什么?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "39eef08f-a109-4d60-8641-567285b22a4d",
   "metadata": {},
   "source": [
    "1. **k近邻法（K-Nearest Neighbors, KNN）的三要素**：\n",
    "   - 距离度量：KNN算法中首先要确定一个距离度量方式，用来计算未知样本与已知样本之间的距离。常用的距离度量包括欧氏距离、曼哈顿距离、闵可夫斯基距离。\n",
    "   - K值的选择：K值即邻居的数量，它是一个参数，需要在算法运行前确定。K值的选择对分类结果有很大影响，较小的K值容易受到噪声数据的影响，较大的K值则可能导致模型过于平滑，无法捕捉数据的局部特征。\n",
    "   - 分类决策规则：在找到K个最近邻之后，需要一个规则来决定新样本的分类。对于分类问题，常见的规则是多数投票法，即看K个最近邻中哪个类别的数量最多，就将新样本归为该类别。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fc6943f2-665e-412d-ae21-e747392045b3",
   "metadata": {},
   "source": [
    "2. **k近邻法的特点**：\n",
    "   - 简单易懂：KNN算法的原理简单直观，易于理解和实现。\n",
    "   - 无需训练：KNN是一种惰性学习算法，它不需要在训练阶段构建模型，而是直接在预测阶段使用训练数据集。\n",
    "   - 懒惰性：由于KNN在预测时才进行计算，因此它不会从训练数据中学习一个判别函数，而是“懒惰”地在需要做出决策时才计算。\n",
    "   - 对数据敏感：KNN对训练数据非常敏感，因为它需要存储整个训练集，并且对训练集中的每个样本进行距离计算。\n",
    "   - 易受噪声影响：KNN对异常值和噪声很敏感，因为距离度量会将这些点纳入考虑。\n",
    "   - 计算成本高：在大数据集上，KNN可能需要较长的时间来计算新样本与所有训练样本之间的距离。\n",
    "   - 空间复杂度高：由于需要存储整个训练集，KNN的空间复杂度较高。\n",
    "   - 非概率性：KNN只是根据最近邻的类别来做出预测，不提供概率解释。\n",
    "   - 可扩展性：KNN可以用于分类和回归问题，并且可以处理多类问题。"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7793c643-5ad1-485e-8a17-2762b2d46876",
   "metadata": {},
   "source": [
    "3. **已知标签 labels = ['A','B','C','D’], 对应的数据组为([ [1.0,1.1], [2.0,2.0], [0,0], [4.1,5.1] ]), 利用k近邻法对其进行近邻分类**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "id": "a3f7f794-9acd-41f1-bb0e-fd28f4b95791",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "labels=np.array(['A','B','C','D'])\n",
    "datas=np.array([ [1.0,1.1], [2.0,2.0], [0,0], [4.1,5.1] ])\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "7c81c7ab-86d4-4e51-9511-be522ea91826",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style>#sk-container-id-5 {color: black;background-color: white;}#sk-container-id-5 pre{padding: 0;}#sk-container-id-5 div.sk-toggleable {background-color: white;}#sk-container-id-5 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-5 label.sk-toggleable__label-arrow:before {content: \"▸\";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-5 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-5 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-5 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-5 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-5 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-5 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: \"▾\";}#sk-container-id-5 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-5 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-5 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-5 div.sk-parallel-item::after {content: \"\";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-5 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-5 div.sk-serial::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-5 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-5 div.sk-item {position: relative;z-index: 1;}#sk-container-id-5 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-5 div.sk-item::before, #sk-container-id-5 div.sk-parallel-item::before {content: \"\";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-5 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-5 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-5 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-5 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-5 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-5 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-5 div.sk-label-container {text-align: center;}#sk-container-id-5 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-5 div.sk-text-repr-fallback {display: none;}</style><div id=\"sk-container-id-5\" class=\"sk-top-container\"><div class=\"sk-text-repr-fallback\"><pre>KNeighborsClassifier(n_neighbors=1)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class=\"sk-container\" hidden><div class=\"sk-item\"><div class=\"sk-estimator sk-toggleable\"><input class=\"sk-toggleable__control sk-hidden--visually\" id=\"sk-estimator-id-5\" type=\"checkbox\" checked><label for=\"sk-estimator-id-5\" class=\"sk-toggleable__label sk-toggleable__label-arrow\">KNeighborsClassifier</label><div class=\"sk-toggleable__content\"><pre>KNeighborsClassifier(n_neighbors=1)</pre></div></div></div></div></div>"
      ],
      "text/plain": [
       "KNeighborsClassifier(n_neighbors=1)"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.neighbors import KNeighborsClassifier\n",
    "\n",
    "# 实例化模型\n",
    "model = KNeighborsClassifier(n_neighbors=1)\n",
    "\n",
    "# 使用训练集训练模型\n",
    "model.fit(datas,labels)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "id": "e9d3e5bd-2cab-49bd-bc30-fbcd9a4dc3e1",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The predicted label for the new data point is: B\n"
     ]
    }
   ],
   "source": [
    "def model1(a, b):\n",
    "    new_data = np.array([[a, b]])\n",
    "    new_labels = model.predict(new_data)\n",
    "\n",
    "    # 打印预测结果\n",
    "    print(\"The predicted label for the new data point is:\", new_labels[0])\n",
    "\n",
    "# 测试函数\n",
    "model1(3.5, 1.0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa2341cf-fdb9-446c-8c49-148b2c145357",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "77a718c0-204e-41f2-b60e-591a7d46c4ab",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f48a0f0c-294f-4a4e-bc3f-6810a7e37118",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
